Extracting causation from millennial-scale climate fluctuations in the last 800 kyr

The detection of cause-effect relationships from the analysis of paleoclimatic records is a crucial step to disentangle the main mechanisms at work in the climate system. Here, we show that the approach based on the generalized Fluctuation–Dissipation Relation, complemented by the analysis of the Transfer Entropy, allows the identification of the causal links between temperature, CO$_2$ concentration and astronomical forcing during the glacial cycles of the last 800 kyr, based on Antarctic ice core records. When considering the whole spectrum of time scales, the results of the analysis suggest that temperature drives CO$_2$ concentration, or that both are driven by the common astronomical forcing. However, when only millennial-scale fluctuations are considered, the results reveal the presence of more complex causal links, indicating that CO$_2$ variations contribute to driving the changes of temperature on such time scales. The results also evidence a slow temporal variability in the strength of the millennial-scale causal links between temperature and CO$_2$ concentration.


Derivation of the generalized FDR
In this section we briefly sketch the derivation of formula (2) of the main text. A more detailed exposition can be found in Ref. 1. Let $x(t) = [x_1(t), \dots, x_n(t)]$ be a multivariate Markov process of dimension $n$, whose stationary PDF $p_{st}(x)$ is smooth and nonvanishing. We want to understand the effect on the dynamics $x(t)$ of an instantaneous small perturbation $\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)$ applied at time $t = 0$, by measuring the displacement it generates on the averages,
$$
\delta \langle x(t) \rangle = \langle x(t) \rangle_{\varepsilon} - \langle x(t) \rangle \,, \qquad (S1)
$$
where the first term corresponds to the perturbed dynamics and the second to the original (unperturbed) one. The averages should be interpreted as carried out over many realizations of the perturbed and of the original dynamics.

The analytical computation of Eq. (S1) requires the knowledge of the joint probability of $x(t)$ and $x_0 = x(0)$, which for a Markov process can be expressed as
$$
p(x, t; x_0, 0) = W(x, t \mid x_0)\, p_0(x_0)\,,
$$
where $W(x, t \mid x_0)$ is the transition probability (propagator) and $p_0$ is the density of the initial condition. Since the perturbation involves only the initial condition, the probability density of the perturbed system is nothing but a rigid shift of the invariant distribution of the unperturbed system, $p_0(x_0) = p_{st}(x_0 - \varepsilon)$; for example, in the scalar case, if $p_{st}(x_0) = \exp[-x_0^2/(2\sigma^2)]/\sqrt{2\pi\sigma^2}$ is a Gaussian with zero average, the perturbation will bring the system into a Gaussian distribution with average $\varepsilon$, i.e. $\exp[-(x_0 - \varepsilon)^2/(2\sigma^2)]/\sqrt{2\pi\sigma^2}$. Moreover, because the perturbation affects only the initial state, the evolution rule is unchanged, so that both systems share the same propagator $W(x, t \mid x_0)$, see Ref. 1. We can therefore write
$$
\delta \langle x(t) \rangle = \int \!\! \int x\, W(x, t \mid x_0) \left[ p_{st}(x_0 - \varepsilon) - p_{st}(x_0) \right] dx\, dx_0 \,.
$$
Since $|\varepsilon| \ll 1$, the above expression can be expanded to first order in $|\varepsilon|$, leading to the formula
$$
\delta \langle x_i(t) \rangle \simeq - \sum_j \varepsilon_j \left\langle x_i(t)\, \frac{\partial \ln p_{st}(x)}{\partial x_j} \bigg|_{t=0} \right\rangle \,.
$$
Then the response function of the generalized FDR reads
$$
(R_t)_{ij} = \frac{\delta \langle x_i(t) \rangle}{\delta \varepsilon_j} = - \left\langle x_i(t)\, \frac{\partial \ln p_{st}(x)}{\partial x_j} \bigg|_{t=0} \right\rangle \,, \qquad (S4)
$$
where the average $\langle \cdot \rangle$ is computed on the unperturbed system, and $R_t$ is the matrix of the linear response functions. The above equation is valid if the system admits a (sufficiently smooth) invariant distribution $p_{st}(x)$.

When the response formula (S4) is applied to a linear Markov process, namely an Ornstein-Uhlenbeck process (OUP), we obtain exactly Eq. (3) of the main text. The OUP is defined by the evolution
$$
\dot{x}(t) = - A\, x(t) + B\, \eta(t) \,,
$$
where $A$ and $B$ are constant $n \times n$ matrices, and $\eta(t)$ is an array of $n$ independent delta-correlated Gaussian noises. The stationary PDF of the OUP, which is required in Eq. (S4), is known (Ref. 2) to be the Gaussian
$$
p_{st}(x) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\left( - \frac{1}{2}\, x^T \Sigma^{-1} x \right) \,,
$$
with $\Sigma$ the stationary covariance matrix (solution of the Lyapunov equation $A\Sigma + \Sigma A^T = B B^T$), thus independent of time. Now, by applying Eq. (S4), we straightforwardly obtain
$$
R_t = \langle x(t)\, x^T(0) \rangle\, \Sigma^{-1} \,,
$$
which is exactly formula (3) of the main text, upon realizing that $\Sigma = C(0)$. Eq. (S4) expresses the link between responses and correlation functions of the unperturbed dynamics; it allows the response functions to be computed from suitable correlation functions, provided that the functional form of $p_{st}(x)$ is known, either a priori or inferred from data (often a completely non-trivial task).
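As a consistency check, one can verify directly that for the OUP the correlation-based formula reproduces the exact response. With the convention $\dot{x} = -Ax + B\eta$ adopted above (the sign convention is ours, and the main text may write the drift differently), a small kick $\varepsilon$ at $t = 0$ is propagated deterministically on average, so that
$$
R_t = e^{-At} \,, \qquad C(t) = \langle x(t)\, x^T(0) \rangle = e^{-At}\, \Sigma = e^{-At}\, C(0)
\quad \Longrightarrow \quad
C(t)\, C(0)^{-1} = e^{-At} = R_t \,,
$$
in agreement with Eq. (3) of the main text for this exactly solvable case.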

Evaluation of the error
In this section we provide an estimate of the error associated with the filtering procedure discussed in the main text. In short, we compute the correlation functions of the filtered process, showing that they are equal to those of the fast variables defined in the Methods section, up to additive terms which depend on the time-scale separation between the fast dynamics, the slow dynamics and the filtering window $T_w$.
We start from the relation implied by Eq. (12) of the main text. The slow-variable evolution is governed by Eq. (7) of the main text, whose complete solution can be written explicitly. The first term on the right-hand side is a transient, whose contribution becomes negligible as soon as $t$ is larger than a few $\tau_0$. To evaluate the integral, let us remember that $f(t)$ is a slowly varying function, so that $f(s)$ varies little over the relevant integration range; as a consequence, the third term of the right-hand side of Eq. (S7) can be simplified by exploiting also Eq. (5) of the main text.

The filtered variables can thus be approximated accordingly. We want to estimate the correlation functions appearing in the generalized FDR for linear dynamics. In the light of the above, each correlation function of the filtered variables involves the average of a product of two contributions, which leads to four terms, one of which is $\langle x_F(t)\, x_F^T(t') \rangle$. We have to show that the remaining ones are negligible.
To estimate these terms it is important to recall the properties of the matrix $A$ and the fact that the dynamics of $x_F$ is given by Eq. (8) of the main text (see Ref. 2). In this computation we have exploited our hypothesis about the diagonalizability of the matrix $A$. This condition simplifies the calculation, as it allows one to write the exponential of the matrix in a simple way. Let us stress, however, that a similar calculation could also be carried out for generic invertible matrices, by making use of the Jordan normal form to compute the exponentials. The relevant integrals are then diagonal matrices, whose $j$-th diagonal element can be evaluated in terms of $a_j^D$, the $j$-th diagonal element of the matrix $A_D$, by exploiting the asymptotic expansion $\mathrm{erfc}(x) \simeq e^{-x^2}/(x\sqrt{\pi})$, valid for $x \gg 1$. Since we are interested in the time range in which $t - t'$ is at most $O(T_w)$, and recalling that the spectral radius of $A$ is of order $\tau_0^{-1}$, we can conclude that these terms are at most $O(\tau_0/T_w)$. Reasoning in the same way for the remaining terms, one obtains analogous bounds. There are therefore two sources of error in the approximation. To minimize the error, one has to choose $T_w$ of the order of $(\tau_0 \tau_1^2)^{1/3}$.
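To make the choice of the optimal window explicit, one can balance the two error contributions just discussed. Here we assume, consistently with the argument above, that the residual fast-dynamics terms scale as $O(\tau_0/T_w)$ and that the distortion of the slow forcing introduced by a Gaussian filter of width $T_w$ scales as $O(T_w^2/\tau_1^2)$; the latter scaling is our reading of the slow-variation argument and is stated here as an assumption. Balancing the two terms,
$$
\frac{\tau_0}{T_w} \sim \frac{T_w^2}{\tau_1^2}
\quad \Longrightarrow \quad
T_w^3 \sim \tau_0\, \tau_1^2
\quad \Longrightarrow \quad
T_w \sim \left( \tau_0\, \tau_1^2 \right)^{1/3} \,,
$$
which is precisely the optimal choice quoted above.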

Numerical examples
To illustrate how the proposed combination of filtering and GFDR works in practice, we consider some pedagogical examples based on suitably defined toy models.

A toy model with linear fast dynamics
Let $x$ be the two-dimensional system defined by Eq. (4) of the main text, for a specific choice of the $2 \times 2$ matrix $A$ and of the slow forcing $f(t)$. The above dynamics can be easily simulated with the standard Euler-Maruyama algorithm (Ref. 3), and the proposed analysis can be applied to the numerically generated trajectories. The outcomes are shown in Figure S1. In panel (a) the solid lines represent the actual response functions, computed according to the definition, as in Eq. (1) of the main text (an average over a large number of realizations is considered). The dashed lines are obtained by naively applying the formula of the generalized FDR valid for linear systems, Eq. (3) of the main text, to the dynamics of $x$, which is neither Markovian nor linear. Finally, the points are obtained by applying the generalized FDR to the properly filtered variables. This latter procedure is in good agreement with the results obtained by direct measurement of the response, as predicted by our analytical argument, while the naive analysis without filtering leads to quite misleading results. Panel (b) shows the relative error of correlation functions and GFDR as a function of $T_w$, the width of the Gaussian filter. As expected, the best results are obtained for $T_w$ close to $(\tau_0 \tau_1^2)^{1/3}$.
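As a concrete illustration of the procedure, the sketch below simulates a two-dimensional linear system with a slow forcing by the Euler-Maruyama scheme, removes the slow component with a Gaussian filter of width $T_w$, and estimates the response matrix through the linear GFDR, $R(t) = C(t)\, C(0)^{-1}$. All numerical values (the matrix $A$, the noise amplitudes, the form and period of the forcing, the time step) are illustrative placeholders and are not the parameters of Eq. (4) of the main text; the filter (a running Gaussian average subtracted from the signal) is our reading of the filtering procedure of the Methods section.

import numpy as np

tau0, tau1 = 1.0, 100.0                      # fast relaxation time, slow-forcing time scale
A = np.array([[1.0, -0.5],
              [0.3,  1.5]]) / tau0           # stable 2x2 matrix, spectral radius ~ 1/tau0
B = np.diag([0.5, 0.5])                      # noise amplitudes
dt, n_steps = 0.05, 200_000                  # integration step and number of steps

def forcing(t):
    # slowly varying forcing acting on both components (illustrative choice)
    return np.array([np.sin(2*np.pi*t/tau1), 0.5*np.cos(2*np.pi*t/tau1)])

# Euler-Maruyama integration of dx/dt = -A x + f(t) + B eta(t)
rng = np.random.default_rng(0)
x = np.zeros(2)
traj = np.empty((n_steps, 2))
for k in range(n_steps):
    x = x + (-A @ x + forcing(k*dt))*dt + B @ (np.sqrt(dt)*rng.standard_normal(2))
    traj[k] = x

# High-pass filtering: subtract a running Gaussian average of width T_w
T_w = (tau0*tau1**2)**(1/3)                  # near-optimal window, see the previous section
sig = T_w/dt                                 # filter width in number of steps
s = np.arange(-int(4*sig), int(4*sig) + 1)
kern = np.exp(-s**2/(2*sig**2)); kern /= kern.sum()
slow = np.column_stack([np.convolve(traj[:, i], kern, mode="same") for i in range(2)])
x_f = traj - slow                            # filtered ("fast") variables

# Linear GFDR: R(t) = C(t) C(0)^(-1), with C(t) the matrix of lagged correlations
def corr(data, lag):
    a, b = (data[lag:], data[:-lag]) if lag > 0 else (data, data)
    return np.einsum('ti,tj->ij', a, b)/len(a)

C0_inv = np.linalg.inv(corr(x_f, 0))
for lag in (0, 20, 40, 100):
    print(f"t = {lag*dt:5.1f}   R =", np.round(corr(x_f, lag) @ C0_inv, 3))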

Inclusion of nonlinear terms
The above argument is valid when the interactions between the variables are linear; however, one may study what happens when a nonlinear interaction term is added. It is indeed known that the FDR is quite robust with respect to the addition of small nonlinear perturbations of the dynamics. In Fig. S2 we study model (S18) modified by the addition of a nonlinear interaction term. As the figure shows, the presence of such a nonlinear perturbation does not alter in a relevant way the ability of the filtering method to reproduce the actual response from long time series of data.
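Continuing the sketch above, a minimal way to repeat the experiment with a nonlinear perturbation is to modify the drift used in the Euler-Maruyama loop. The specific cubic coupling below is a hypothetical example; the actual (underlined) term of model (S18) is not reproduced here.

eps_nl = 0.05                                # strength of the nonlinear perturbation

def drift_nl(x, t):
    d = -A @ x + forcing(t)
    d[0] += -eps_nl * x[0] * x[1]**2         # illustrative (hypothetical) nonlinear coupling
    return d

# in the integration loop, the update becomes:
#   x = x + drift_nl(x, k*dt)*dt + B @ (np.sqrt(dt)*rng.standard_normal(2))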

Lorenz '63 forcing
A different kind of test consists in modifying the forcing with some less regular function. To this end, we use the first component of the Lorenz '63 model with the usual choice of the parameters $\sigma = 10$, $\beta = 8/3$, $\rho = 28$. The results are shown in Fig. S3.
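Continuing the same sketch, the slow forcing can be replaced by the first component of a Lorenz '63 trajectory. The time rescaling (which makes the Lorenz signal slow compared with the fast dynamics) and the amplitude factor are assumptions of this illustration, not the values used for Fig. S3.

sigma_L, beta_L, rho_L = 10.0, 8.0/3.0, 28.0       # usual Lorenz '63 parameters

def lorenz_rhs(s):
    x, y, z = s
    return np.array([sigma_L*(y - x), x*(rho_L - z) - y, x*y - beta_L*z])

def lorenz_step(s, h):
    # one fourth-order Runge-Kutta step
    k1 = lorenz_rhs(s); k2 = lorenz_rhs(s + 0.5*h*k1)
    k3 = lorenz_rhs(s + 0.5*h*k2); k4 = lorenz_rhs(s + h*k3)
    return s + (h/6.0)*(k1 + 2*k2 + 2*k3 + k4)

state = np.array([1.0, 1.0, 1.0])
lorenz_x = np.empty(n_steps)
h = dt/tau1                   # one Lorenz time unit spans ~tau1 time units of the fast dynamics
for k in range(n_steps):
    state = lorenz_step(state, h)
    lorenz_x[k] = state[0]

def forcing_lorenz(k):
    # irregular forcing acting on the first component only (illustrative choice)
    return np.array([0.1*lorenz_x[k], 0.0])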
Transfer entropy in the linear case
In what follows, $|M|$ stands for the determinant of the matrix $M$. It is also useful to remember that the covariance matrix of a conditioned Gaussian distribution verifies
$$
\Sigma_{x|y} = \Sigma_x - C\, \Sigma_y^{-1}\, C^T \,, \qquad
\Sigma_{x,y} = \begin{pmatrix} \Sigma_x & C \\ C^T & \Sigma_y \end{pmatrix} ,
$$
where $\Sigma_{x,y}$ is the covariance matrix of the joint distribution and $C$ is its cross-covariance block. Keeping this in mind, for the linear case one can rewrite Eq. (18) of the main text in terms of covariance matrices and, taking into account Eq. (S23), reduce it to a ratio of determinants of conditional covariances.
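For reference, a compact numerical sketch of the Gaussian (linear) transfer entropy is given below. It uses the conditional-covariance identity recalled above together with the standard Gaussian expression $TE_{y \to x} = \frac{1}{2} \ln [\mathrm{Var}(x_{t+\tau} \mid x_t) / \mathrm{Var}(x_{t+\tau} \mid x_t, y_t)]$; whether the conditioning set and the lag convention coincide exactly with Eq. (18) of the main text is an assumption of this sketch.

import numpy as np

def gaussian_te(x, y, lag):
    # Gaussian transfer entropy from the scalar series y to the scalar series x, at the given lag
    xf, x0, y0 = x[lag:], x[:-lag], y[:-lag]
    c = np.cov(xf, x0)                                        # 2x2 covariance of (x_future, x_past)
    var_x = c[0, 0] - c[0, 1]**2/c[1, 1]                      # Var(x_future | x_past)
    c2 = np.cov(np.vstack([xf, x0, y0]))                      # 3x3 covariance of (x_future, x_past, y_past)
    s_ab, s_bb = c2[0, 1:], c2[1:, 1:]
    var_xy = c2[0, 0] - s_ab @ np.linalg.solve(s_bb, s_ab)    # Var(x_future | x_past, y_past)
    return 0.5*np.log(var_x/var_xy)

# example usage on the filtered toy-model series of the previous sketches:
#   te_21 = gaussian_te(x_f[:, 0], x_f[:, 1], lag=20)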

Further methodological remarks
To apply the proposed method to real situations, synchronised and equally time-spaced data are required, so that lagged cross-correlations can be suitably computed. To this aim, the original data series have to be interpolated, as discussed in the Methods section of the main text. Of course, this operation may introduce subtle sources of error. For instance, if the frequency of the interpolated data is larger than the original one, spurious correlations may be introduced, basically due to the repeated use of the same data to generate the new points. It is thus important to carefully check that the interpolation procedure does not alter the results. Figure 2 in the Methods section already shows that the temporal resolution of the temperature and CO$_2$ records is different and varies with time. Another way to further verify whether a non-homogeneously time-spaced dataset can be approximated by an interpolated series with a given lag relies on the analysis of the data population along the sample. The idea is to divide the total time interval into smaller segments and to count the number of data points falling in each of them. The statistics of the data populations allows one to determine how reliable the interpolation is for a given value of the lag. If, for instance, most segments show a density of points which is lower than the inverse of the lag, this clearly means that in many parts of the dataset we are creating more points than were originally present. Note that this criterion is much stricter than simply considering the average lag between two data points in the original series, as it takes into account the possibility of large fluctuations of the data density along the time series.
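The criterion just described can be made operational with a few lines of code: the record is divided into segments of fixed length and the number of raw (non-interpolated) samples falling in each segment is counted. The 10 kyr segment length below is an illustrative choice, not necessarily the one used for Fig. S4.

import numpy as np

def population_check(sample_times_kyr, lag_kyr, segment_kyr=10.0):
    # count raw data points in consecutive segments and flag those too sparse for the chosen lag
    t = np.sort(np.asarray(sample_times_kyr))
    edges = np.arange(t[0], t[-1] + segment_kyr, segment_kyr)
    counts, _ = np.histogram(t, bins=edges)
    min_points = segment_kyr/lag_kyr          # a segment supports the lag only above this density
    frac_sparse = np.mean(counts < min_points)
    return counts, frac_sparse

# hypothetical usage, with co2_times the raw sampling times (in kyr) of the CO2 record:
#   counts, frac = population_check(co2_times, lag_kyr=0.5)
#   print(f"{100*frac:.0f}% of the segments are too sparse for a 0.5 kyr lag")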
A graphical illustration of this analysis is shown in Fig. S4. From the figure it is clear that the [CO$_2$] dataset is the most critical one, as the fraction of segments containing only a few points is larger. To be more quantitative, about 80% of the considered intervals are populated by fewer than 20 points, meaning that the results obtained with a lag of 0.5 kyr (the first non-zero lag value in our plots) may be affected by the spurious correlations produced by the interpolation, as stated in the Methods section of the main text. On the other hand, only 25% of the intervals contain fewer than 10 data points, meaning that the results at lags of 1 kyr and larger are quite reliable in this respect.
To avoid potential issues arising from the different resolution of the temperature and CO$_2$ signals, in the analysis we opted for degrading the resolution of the temperature signal to that of the CO$_2$ concentration (and vice versa, in the few intervals where the temporal resolution of [CO$_2$] is larger than that of T), see Fig. 1 in the main text. Here, in Fig. S5 we show the results that would be obtained without such degradation of the resolution. In that case, the effect of [CO$_2$] on temperature becomes almost twice that of the opposite link, indicating that the estimated strength of the causal links can depend on many details, including differences in resolution. Clearly, such issues are amplified by the limited number of points available for the analysis.
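One simple way to perform such a degradation, given here as a sketch, is to average the finer record inside bins centred on the sampling times of the coarser one; whether this averaging rule matches the procedure actually used for Fig. 1 of the main text is an assumption.

import numpy as np

def degrade_to(coarse_t, fine_t, fine_v):
    # average the finer record (fine_t, fine_v) inside bins centred on the coarser sampling times
    coarse_t, fine_t, fine_v = map(np.asarray, (coarse_t, fine_t, fine_v))
    mids = 0.5*(coarse_t[1:] + coarse_t[:-1])
    edges = np.concatenate(([-np.inf], mids, [np.inf]))
    idx = np.digitize(fine_t, edges) - 1
    return np.array([fine_v[idx == i].mean() if np.any(idx == i) else np.nan
                     for i in range(len(coarse_t))])

# hypothetical usage: bring the temperature record to the CO2 sampling times
#   T_degraded = degrade_to(co2_times, temp_times, temp_values)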
In any case, the main message of these results, namely the presence of mutual causal relationships between CO$_2$ concentration and temperature, which would be lost without high-pass filtering, does not change. An interesting test is to repeat the analysis by considering only half of the dataset, to verify whether there is a temporal dependence of the causal relationships between temperature and CO$_2$ concentration. Here, we consider either the first or the second half of the record, namely the data referring to either the first or the last 400 kyr. Of course, in this way the statistical uncertainty increases, as we are considering a more limited amount of data. Also in these cases, the high-pass filter allows causal links to be detected on the scale of 1 kyr, which are hidden by the slow dynamics when the analyses are performed on unfiltered data. After applying the filtering procedure, the [CO$_2$] → T causal link appears to dominate in the first 400 kyr, as shown in Fig. S6, while it becomes weaker than the reverse link in the last 400 kyr, as shown in Fig. S7. This result could simply be a statistical fluctuation generated by the limited amount of data, or it could point to a role of the temporal resolution in modulating the estimated causal relationships between CO$_2$ and T, or else it could reflect a true temporal variability in the strength of the causal links between the two variables. In any case, all these results confirm the presence of millennial-scale mutual causal relationships between CO$_2$ and temperature, which could not be detected in the unfiltered record.

Figure S6. Analysis of the causal relations between T and [CO$_2$] from paleoclimate data, using a spline interpolation and considering only the first 400 kyr of the record. As before, the cross terms of the generalized FDR [panels (a) and (b)], as well as the corresponding transfer entropies [panels (c) and (d)], are shown. Each quantity has been computed before and after the high-pass filtering procedure of the data series.

Figure S7. Analysis of the causal relations between T and [CO$_2$] from paleoclimate data, using a spline interpolation and considering only the last 400 kyr. As before, the cross terms of the generalized FDR [panels (a) and (b)], as well as the corresponding transfer entropies [panels (c) and (d)], are shown. Each quantity has been computed before and after the high-pass filtering procedure of the data series.